<!DOCTYPE html>
<html>
<!-- Created by GNU Texinfo 7.1.1, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- Copyright © 2006-2023 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "Funding Free Software", the Front-Cover
texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below).  A copy of the license is included in the section entitled
"GNU Free Documentation License".

(a) The FSF's Front-Cover Text is:

A GNU Manual

(b) The FSF's Back-Cover Text is:

You have freedom to copy and modify this GNU Manual, like GNU
     software.  Copies published by the Free Software Foundation raise
     funds for GNU development. -->
<title>AMD Radeon (GNU libgomp)</title>

<meta name="description" content="AMD Radeon (GNU libgomp)">
<meta name="keywords" content="AMD Radeon (GNU libgomp)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta name="viewport" content="width=device-width,initial-scale=1">

<link href="index.html" rel="start" title="Top">
<link href="Library-Index.html" rel="index" title="Library Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Offload_002dTarget-Specifics.html" rel="up" title="Offload-Target Specifics">
<link href="nvptx.html" rel="next" title="nvptx">
<style type="text/css">
<!--
a.copiable-link {visibility: hidden; text-decoration: none; line-height: 0em}
span:hover a.copiable-link {visibility: visible}
ul.mark-bullet {list-style-type: disc}
-->
</style>


</head>

<body lang="en">
<div class="section-level-extent" id="AMD-Radeon">
<div class="nav-panel">
<p>
Next: <a href="nvptx.html" accesskey="n" rel="next">nvptx</a>, Up: <a href="Offload_002dTarget-Specifics.html" accesskey="u" rel="up">Offload-Target Specifics</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Library-Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<h3 class="section" id="AMD-Radeon-_0028GCN_0029"><span>12.1 AMD Radeon (GCN)<a class="copiable-link" href="#AMD-Radeon-_0028GCN_0029"> &para;</a></span></h3>

<p>On the hardware side, there is the hierarchy (fine to coarse):
</p><ul class="itemize mark-bullet">
<li>work item (thread)
</li><li>wavefront
</li><li>work group
</li><li>compute unit (CU)
</li></ul>

<p>All OpenMP and OpenACC levels are used, i.e.
</p><ul class="itemize mark-bullet">
<li>OpenMP&rsquo;s simd and OpenACC&rsquo;s vector map to work items (thread)
</li><li>OpenMP&rsquo;s threads (&ldquo;parallel&rdquo;) and OpenACC&rsquo;s workers map
      to wavefronts
</li><li>OpenMP&rsquo;s teams and OpenACC&rsquo;s gang use a threadpool with the
      size of the number of teams or gangs, respectively.
</li></ul>

<p>The used sizes are
</p><ul class="itemize mark-bullet">
<li>Number of teams is the specified <code class="code">num_teams</code> (OpenMP) or
      <code class="code">num_gangs</code> (OpenACC) or otherwise the number of CU. It is limited
      by two times the number of CU.
</li><li>Number of wavefronts is 4 for gfx900 and 16 otherwise;
      <code class="code">num_threads</code> (OpenMP) and <code class="code">num_workers</code> (OpenACC)
      overrides this if smaller.
</li><li>The wavefront has 102 scalars and 64 vectors
</li><li>Number of workitems is always 64
</li><li>The hardware permits maximally 40 workgroups/CU and
      16 wavefronts/workgroup up to a limit of 40 wavefronts in total per CU.
</li><li>80 scalars registers and 24 vector registers in non-kernel functions
      (the chosen procedure-calling API).
</li><li>For the kernel itself: as many as register pressure demands (number of
      teams and number of threads, scaled down if registers are exhausted)
</li></ul>

<p>The implementation remark:
</p><ul class="itemize mark-bullet">
<li>I/O within OpenMP target regions and OpenACC parallel/kernels is supported
      using the C library <code class="code">printf</code> functions and the Fortran
      <code class="code">print</code>/<code class="code">write</code> statements.
</li><li>Reverse offload regions (i.e. <code class="code">target</code> regions with
      <code class="code">device(ancestor:1)</code>) are processed serially per <code class="code">target</code> region
      such that the next reverse offload region is only executed after the previous
      one returned.
</li><li>OpenMP code that has a requires directive with <code class="code">unified_address</code> or
      <code class="code">unified_shared_memory</code> will remove any GCN device from the list of
      available devices (&ldquo;host fallback&rdquo;).
</li><li>The available stack size can be changed using the <code class="code">GCN_STACK_SIZE</code>
      environment variable; the default is 32 kiB per thread.
</li></ul>



</div>
<hr>
<div class="nav-panel">
<p>
Next: <a href="nvptx.html">nvptx</a>, Up: <a href="Offload_002dTarget-Specifics.html">Offload-Target Specifics</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Library-Index.html" title="Index" rel="index">Index</a>]</p>
</div>



</body>
</html>
